Quantitative Protein Levels: Healthy Tissue in HPA & GTEx

Portfolio targets

Author

Target Sciences

Published

13 February 2026

Department: Therapeutics (Target Sciences)

knitr::read_chunk("01b-frontMatter.R")
knitr::opts_chunk$set(
  fig.width = 12, 
  fig.height = 8, 
  fig.path = "markdown_figs/", 
  dev = "png", 
  eval = TRUE, 
  echo = FALSE, 
  warning = FALSE, 
  message = FALSE, 
  tidy = FALSE
)
## This switch allows for document-type dependent output (e.g. interactive graphs) https://trinkerrstuff.wordpress.com/2014/11/18/rmarkdown-alter-action-depending-on-document/
document_output_type <- knitr::opts_knit$get("rmarkdown.pandoc.to")
# print(document_output_type)
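
As an illustration of how such a switch might be consumed later in the report, the sketch below decides whether interactive output is appropriate from the pandoc target. The helper name `is_interactive_output` is hypothetical and not part of the original code; knitr also provides `knitr::is_html_output()` for a similar purpose.

```r
# Hypothetical helper (not from the original report): TRUE when the pandoc
# target is an HTML format, FALSE otherwise.
# `output_type` is the value of knitr::opts_knit$get("rmarkdown.pandoc.to"),
# which is NULL when the code is run outside of a render.
is_interactive_output <- function(output_type) {
  !is.null(output_type) && grepl("^html", output_type)
}

html_case  <- is_interactive_output("html")   # HTML render
latex_case <- is_interactive_output("latex")  # PDF render via LaTeX
null_case  <- is_interactive_output(NULL)     # interactive session, no render
```

A chunk could then branch on this value, for example drawing an interactive table for HTML and a static figure otherwise.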

1 Aim

To identify tissues at risk of on-target off-tumour toxicity.

2 Results

A quantitative proteome and transcriptome abundance atlas of 29 paired healthy human tissues from the Human Protein Atlas project, covering 18,072 transcripts and 13,640 proteins. N.B. There are no replicates per tissue.

The distribution of target protein expression across a comprehensive panel of healthy human tissues assayed by GTEx. Expression level is the protein abundance measured by tandem mass tag (TMT) 10plex/MS3 mass spectrometry in 201 GTEx samples from 32 different tissue types of 14 normal individuals.

A systematic quantification of 10,841 unique proteins from 720 GTEx samples, representing five human tissues (sigmoid colon, heart left ventricle, liver, lung and thyroid).
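
To make the link between these datasets and the aim concrete, the following is a minimal sketch of how a target's abundance could be summarised per tissue and ranked, so that the highest-expressing healthy tissues surface first. It is not the code used for this report. The function name, the long-format layout and the column names (`gene_name`, `tissue`, `abundance`) are all assumptions, and the values are invented.

```r
# Hypothetical helper: median abundance per tissue for one target, highest first.
# `long` is assumed to have one row per protein-sample measurement with columns
# gene_name, tissue and abundance.
rank_tissues_for_target <- function(long, target) {
  hits <- long[long$gene_name == target & !is.na(long$abundance), , drop = FALSE]
  if (nrow(hits) == 0) {
    return(data.frame(tissue = character(0),
                      median_abundance = numeric(0),
                      n_samples = integer(0)))
  }
  med <- tapply(hits$abundance, hits$tissue, median)
  n   <- tapply(hits$abundance, hits$tissue, length)
  out <- data.frame(
    tissue           = names(med),
    median_abundance = as.numeric(med),
    n_samples        = as.integer(n[names(med)]),
    stringsAsFactors = FALSE
  )
  # Sort so that the tissues with the highest median abundance come first
  out <- out[order(-out$median_abundance), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data (invented values, for illustration only)
toy_target_long <- data.frame(
  gene_name = c("GENEA", "GENEA", "GENEA", "GENEA", "GENEB"),
  tissue    = c("liver", "liver", "lung", "thyroid", "liver"),
  abundance = c(4, 6, 9, NA, 100),
  stringsAsFactors = FALSE
)
toy_rank <- rank_tissues_for_target(toy_target_long, "GENEA")
```

The `n_samples` column is kept because the datasets differ in replication: with no replicates per tissue, a median is simply the single observed value and should be read with corresponding caution.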

3 Methods

3.1 Human Protein Atlas

Re-processed data [1] for 29 healthy tissues was downloaded from the "Expression values across all genes" tab at:

  • https://www.ebi.ac.uk/gxa/experiments/E-PROT-29/Downloads
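
A downloaded table of this kind is typically wide (one row per gene, one column per tissue), whereas plotting usually needs one row per gene-tissue pair. The sketch below shows one way to reshape it using base R. The identifier column names (`Gene ID`, `Gene Name`) and the helper name are assumptions about the file layout rather than something confirmed by this report, and the toy values are invented.

```r
# Hypothetical helper: reshape a wide "genes x tissues" table into long form.
# The id column names are assumptions; adjust them to match the actual file.
reshape_atlas_wide <- function(wide, id_cols = c("Gene ID", "Gene Name")) {
  tissue_cols <- setdiff(names(wide), id_cols)
  long <- do.call(rbind, lapply(tissue_cols, function(tissue) {
    data.frame(
      gene_id   = wide[[id_cols[1]]],
      gene_name = wide[[id_cols[2]]],
      tissue    = tissue,
      abundance = as.numeric(wide[[tissue]]),
      stringsAsFactors = FALSE
    )
  }))
  # Drop gene-tissue pairs with no measurement
  long[!is.na(long$abundance), , drop = FALSE]
}

# Toy stand-in for the downloaded table (invented values). A real file could
# be read with read.delim(<path>, check.names = FALSE) to keep column names.
toy_wide <- data.frame(
  `Gene ID`   = c("ENSG_A", "ENSG_B"),
  `Gene Name` = c("GENEA", "GENEB"),
  liver       = c(10.5, NA),
  lung        = c(2.0, 7.25),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
toy_long <- reshape_atlas_wide(toy_wide)
```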

The associated abstract is reproduced below:

“we generated a quantitative proteome and transcriptome abundance atlas of 29 paired healthy human tissues from the Human Protein Atlas project representing human genes by 18,072 transcripts and 13,640 proteins including 37 without prior protein-level evidence. The analysis revealed that hundreds of proteins, particularly in testis, could not be detected even for highly expressed mRNAs, that few proteins show tissue-specific expression, that strong differences between mRNA and protein quantities within and across tissues exist and that protein expression is often more stable across tissues than that of transcripts. Only 238 of 9,848 amino acid variants found by exome sequencing could be confidently detected at the protein level showing that proteogenomics remains challenging, needs better computational methods and requires rigorous validation. Many uses of this resource can be envisaged including the study of gene/protein expression regulation and biomarker specificity evaluation.”

3.2 GTEx

Relative protein abundance in approximately 900 GTEx tissue samples, quantified using TMT LC-MS in two separate studies, was retrieved from:

  • https://gtexportal.org/home/downloads/egtex/proteomics

Fang et al

This normalized protein abundance data represents tandem mass tag (TMT)-based mass spectrometry (MS3) quantification of the relative protein abundance of 720 GTEx samples from five GTEx tissues [2].

Jiang et al

For reference, the normalised abundance table used here is also deposited with the publication:

  • Table S2. C. Normalized protein abundances (transformed back to the absolute abundances)
    https://www.cell.com/cms/10.1016/j.cell.2020.08.036/attachment/0d79a576-f9b7-4342-81a8-f80c14df0372/mmc3.xlsx

This dataset represents protein and RNA expression levels in 32 normal human tissues from 14 individuals [3]:

“The Genotype-Tissue Expression (GTEx) project collected samples from 54 tissues of 948 post-mortem donors and characterized their transcriptomes (Carithers et al., 2015; GTEx Consortium, 2015; Project and eGTEx Project, 2017). For this study we quantitatively profiled the proteome of 201 GTEx samples from 32 different tissue types of 14 normal individuals (Figure 1A), covering all major organs (Table S1). The proteome data were acquired with a tandem mass tag (TMT) 10plex/MS3 mass spectrometry strategy (Figure 1B), which enables 10 isotopically labeled samples to be analyzed in a single experiment (McAlister et al., 2014). To increase the proteome coverage, each TMT 10plex sample was extensively fractionated (Figure 1B). To facilitate cross-tissue comparison and to reduce the influence of technical variation between mass spectrometry runs, we randomized the tissue samples such that each TMT 10plex consists of an assortment of tissues and a reference sample.”

Normalised abundance data was retrieved. The authors describe how it was generated as follows:

“Since there are two reference replicate samples in each run, batch effects were removed by using the relative abundance of each sample to the average of the reference samples. NAs in the reference channels (126, 131 channels) were imputed using a minimum value of 15. The relative abundance of each sample was logarithm transformed at base 2. Different from traditional case-control study or a study with a few conditions, our samples were from 32 different types of tissues, which are highly heterogeneous. The majority of previous normalization methods cannot guarantee a robust and tissue-sample adaptive correction. Here, we applied our data-driven robust normalization method (RobNorm) which took into account sample heterogeneities (Wang et al., 2019b). To robustly estimate the sample effects, we implemented the density-power-weight to down weigh the outliers for the structured data. Our algorithm automatically detected the sample inliers (stable abundances) which were used for the robust normalization and at the same time kept the genuine heterogeneities from outliers. To avoid the bias from missing values, the estimation for sample effects was based on the genes with less than 50% missing values, in total, 6,320 genes. We set density power parameter γ = 1, and took zero vectors as the standard sample to implement RobNorm (Wang et al., 2019b). The sample effects were then corrected for all the genes on the relative abundances in logarithm scale. After normalization, the log ratio values were transformed back to the absolute abundances for missing value imputation in the next step.”
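
The first steps of the quoted procedure (imputing missing reference values, scaling each sample to the mean of the two reference channels, then taking log2) can be sketched as below. This is an illustrative reading of the quoted text, not the authors' code: the function name and the matrix layout are assumptions, the toy intensities are invented, and the RobNorm normalisation and the back-transformation to absolute abundances are deliberately not reproduced.

```r
# Hypothetical helper illustrating reference-based scaling for one TMT 10plex run.
# `intensities`: numeric matrix, proteins in rows, TMT channels in columns.
# `ref_channels`: the two reference columns (channels 126 and 131 in the quote).
# `ref_floor`: value used to impute missing reference intensities (15 in the quote).
relative_log2_abundance <- function(intensities,
                                    ref_channels = c("126", "131"),
                                    ref_floor = 15) {
  ref <- intensities[, ref_channels, drop = FALSE]
  # Impute missing reference values with the stated minimum
  ref[is.na(ref)] <- ref_floor
  ref_mean <- rowMeans(ref)
  sample_cols <- setdiff(colnames(intensities), ref_channels)
  samples <- intensities[, sample_cols, drop = FALSE]
  # Each protein (row) is divided by its own mean reference value, then log2
  log2(samples / ref_mean)
}

# Toy run with two proteins and two sample channels (invented values)
toy_run <- matrix(
  c(100, 200, 400, 100,
     15,  60,  NA,  NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("P1", "P2"), c("126", "127N", "127C", "131"))
)
toy_log2 <- relative_log2_abundance(toy_run)
```

Only the reference channels are imputed here, as in the quoted text; missing sample values remain `NA` and would be handled by the later imputation step the authors mention.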

Please note: “Membrane proteins were underdetected across the entire RNA expression abundance level; of the 5,500 predicted membrane-bound proteins and 3,000 secreted proteins (Uhlén et al., 2015), we detected 3,143 membrane and 1,848 secreted proteins, respectively”

4 R session details

Analysis was performed using R (ver. 4.5.1) and the following additional packages:

Packages used
  Package       Version
  ggplot2       4.0.0
  RColorBrewer  1.1-3
  sysfonts      0.8.9

Package authors
  ggplot2: Hadley Wickham [aut] (ORCID: https://orcid.org/0000-0003-4757-117X), Winston Chang [aut] (ORCID: https://orcid.org/0000-0002-1576-2126), Lionel Henry [aut], Thomas Lin Pedersen [aut, cre] (ORCID: https://orcid.org/0000-0002-5147-4711), Kohske Takahashi [aut], Claus Wilke [aut] (ORCID: https://orcid.org/0000-0002-7470-9261), Kara Woo [aut] (ORCID: https://orcid.org/0000-0002-5125-4188), Hiroaki Yutani [aut] (ORCID: https://orcid.org/0000-0002-3385-7233), Dewey Dunnington [aut] (ORCID: https://orcid.org/0000-0002-9415-4582), Teun van den Brand [aut] (ORCID: https://orcid.org/0000-0002-9335-7468), Posit, PBC [cph, fnd] (ROR: https://ror.org/03wc8by49)
  RColorBrewer: Erich Neuwirth [aut, cre]
  sysfonts: Yixuan Qiu and authors/contributors of the included fonts. See file AUTHORS for details.

R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_4.0.0      sysfonts_0.8.9     RColorBrewer_1.1-3

loaded via a namespace (and not attached):
 [1] sass_0.4.10       generics_0.1.4    digest_0.6.37     magrittr_2.0.4   
 [5] evaluate_1.0.5    grid_4.5.1        showtextdb_3.0    fastmap_1.2.0    
 [9] jsonlite_2.0.0    processx_3.8.6    backports_1.5.0   secretbase_1.0.5 
[13] ps_1.9.1          pander_0.6.6      crosstalk_1.2.2   scales_1.4.0     
[17] codetools_0.2-20  jquerylib_0.1.4   cli_3.6.5         rlang_1.1.6      
[21] withr_3.0.2       cachem_1.1.0      yaml_2.3.10       tools_4.5.1      
[25] dplyr_1.1.4       base64url_1.4     DT_0.34.0         showtext_0.9-7   
[29] curl_7.0.0        vctrs_0.6.5       R6_2.6.1          lifecycle_1.0.4  
[33] htmlwidgets_1.6.4 targets_1.11.4    pkgconfig_2.0.3   callr_3.7.6      
[37] pillar_1.11.1     bslib_0.10.0      gtable_0.3.6      glue_1.8.0       
[41] data.table_1.17.8 Rcpp_1.1.0        xfun_0.54         tibble_3.3.0     
[45] tidyselect_1.2.1  rstudioapi_0.17.1 knitr_1.50        farver_2.1.2     
[49] htmltools_0.5.8.1 igraph_2.2.1      rmarkdown_2.30    compiler_4.5.1   
[53] prettyunits_1.2.0 S7_0.2.0         

5 References

1. Prakash, A. et al. Integrated view of baseline protein expression in human tissues. Journal of Proteome Research 22, 729–742 (2023).
2. Fang, H. et al. Regulation of protein abundance in normal human tissues. medRxiv (2025). doi:10.1101/2025.01.10.25320181
3. Jiang, L. et al. A quantitative proteome map of the human body. Cell 183, 269–283.e19 (2020).